43 research outputs found

    Characterizing the Communication Requirements of GNN Accelerators: A Model-Based Approach

    Relational data in real-world graph representations demands tools capable of studying it accurately. In this regard, the Graph Neural Network (GNN) is a powerful tool, and various GNN models have been developed over the past decade. Recently, there has been a significant push towards creating accelerators that speed up the inference and training process of GNNs. These accelerators, however, do not delve into the impact of their dataflows on the overall data movement and, hence, on the communication requirements. In this paper, we formulate analytical models that capture the amount of data movement in the most recent GNN accelerator frameworks. Specifically, the proposed models capture the dataflows and hardware setup of these accelerator designs and expose their scalability characteristics for a set of hardware, GNN model, and input graph parameters. Additionally, the proposed approach provides means for the comparative analysis of the vastly different GNN accelerators. Comment: ISCAS 202

    Understanding the Impact of On-chip Communication on DNN Accelerator Performance

    Deep Neural Networks have flourished at an unprecedented pace in recent years. They have achieved outstanding accuracy in fields such as computer vision, natural language processing, medicine, or economics. Specifically, Convolutional Neural Networks (CNNs) are particularly suited to object recognition or identification tasks. This, however, comes at a high computational cost, prompting the use of specialized GPU architectures or even ASICs to achieve high speeds and energy efficiency. ASIC accelerators streamline the execution of certain dataflows amenable to CNN computation that imply the constant movement of large amounts of data, thereby turning on-chip communication into a critical function within the accelerator. This paper studies the communication flows within CNN inference accelerators of edge devices, with the aim of justifying current and future decisions in the design of the on-chip networks that interconnect their processing elements. Leveraging this analysis, we then qualitatively discuss the potential impact of introducing the novel paradigm of wireless on-chip networks in this context. Comment: ICECS201

    WHYPE: A Scale-Out Architecture with Wireless Over-the-Air Majority for Scalable In-memory Hyperdimensional Computing

    Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using long random vectors known as hypervectors. Among different hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) has shown promise as it is very efficient in performing matrix-vector multiplications, which are common in the HDC algebra. Although HDC architectures based on IMC already exist, how to scale them remains a key challenge due to the collective communication patterns that these architectures require and that traditional chip-scale networks were not designed for. To cope with this difficulty, we propose a scale-out HDC architecture called WHYPE, which uses wireless in-package communication technology to interconnect a large number of physically distributed IMC cores that either encode hypervectors or perform multiple similarity searches in parallel. In this context, the key enabler of WHYPE is the opportunistic use of the wireless network as a medium for over-the-air computation. WHYPE implements an optimized source coding that allows receivers to calculate the bit-wise majority of multiple hypervectors (a useful operation in HDC) being transmitted concurrently over the wireless channel. By doing so, we achieve a joint broadcast distribution and computation with a performance and efficiency unattainable with wired interconnects, which in turn enables massive parallelization of the architecture. Through evaluations at the on-chip network and complete architecture levels, we demonstrate that WHYPE can bundle and distribute hypervectors faster and more efficiently than a hypothetical wired implementation, and that it scales well to tens of receivers. We show that the average error rate of the majority computation is low, such that it has negligible impact on the accuracy of HDC classification tasks. Comment: Accepted at IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS). arXiv admin note: text overlap with arXiv:2205.1088
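
The bit-wise majority mentioned in the abstract above (often called bundling in HDC) can be illustrated in software. This is a minimal sketch only: it says nothing about WHYPE's source coding or its over-the-air mechanism, and the function names `bitwise_majority` and `hamming_similarity`, the list-of-bits representation, and the tie rule are all assumptions made for illustration.

```python
import random


def bitwise_majority(hypervectors):
    """Bit-wise majority of equal-length binary vectors (lists of 0/1).

    Ties, which can only occur with an even number of inputs, resolve
    to 0 here; that tie rule is an assumption of this sketch.
    """
    n = len(hypervectors)
    # zip(*...) walks the vectors position by position (column-wise).
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*hypervectors)]


def hamming_similarity(a, b):
    """Fraction of positions at which two binary vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Bundle three random hypervectors and compare the result with one of
# the inputs and with an unrelated random vector.
rng = random.Random(0)
dim = 10_000
inputs = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(3)]
unrelated = [rng.randint(0, 1) for _ in range(dim)]

bundle = bitwise_majority(inputs)
print("similarity to an input:   ", hamming_similarity(bundle, inputs[0]))
print("similarity to a stranger: ", hamming_similarity(bundle, unrelated))
```

For three independent random inputs, each bit of the bundle matches a given input with probability 3/4, whereas an unrelated vector matches with probability 1/2. That gap is what lets a similarity search recover the bundled items.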

    Optically Transparent Beam-Steering Reflectarray Antennas Based on Liquid Crystal for Millimeter-Wave Applications

    This study presents a method to realize an optically transparent beam-steering antenna. The RF and optical features of Liquid Crystal (LC) technology are used in combination with transparent metal mesh to realize the first optically transparent reconfigurable reflectarray (RA). Since the electric fields of the bias and Radio Frequency (RF) signals are highly non-uniform, the LC permittivity is both anisotropic and inhomogeneous; thus, the behavior of the LC molecules must be obtained for accurate modeling prior to antenna design. A unit cell consisting of metallic mesh and LC is analyzed and the LC director distribution is obtained. The director data are transformed into permittivity tensors in the entire LC volume, and the LC is discretized in electromagnetic simulation software to perform full-wave periodic boundary simulation that models the anisotropy and inhomogeneity. The discretized model is approximated by a single dielectric block with a new permittivity range for GT7 LC material. A 10×10 RA is fabricated and measured in terms of optical and RF performance. The measured phase shift of the unit cell is 260° when the voltage is increased from 0 V to 40 V. The measured beam scans from -10° to 50° in the E-plane and from -50° to +50° in the H-plane with a 14.35 dBi maximum gain. The prototype optical performance is also measured. The benefits and drawbacks of current RF LC mixtures are discussed. The results show that, with an appropriate LC mixture optimized for both RF and optical transmission, LC-based optically transparent antennas are a viable solution for various new applications.

    StratoTrans : Unmanned Aerial System (UAS) 4G communication framework applied on the monitoring of road traffic and linear infrastructure

    This study provides an operational solution to directly connect drones to the internet by means of 4G telecommunications and exploit drone-acquired data, including telemetry and imagery but focusing on video transmission. The novelty of this work is the application of a 4G connection to link the drone directly to a data server where video (in this case to monitor road traffic) and imagery (in the case of linear infrastructures) are processed. However, this framework is applicable to any other monitoring purpose where the goal is to send real-time video or imagery to the headquarters where the drone data is processed, analyzed, and exploited. We describe a general framework and analyze some key points, such as the hardware to use, the data stream, and the network coverage, but also the complete resulting implementation of the applied unmanned aerial system (UAS) communication system through a Virtual Private Network (VPN) featuring a long-range telemetry high-capacity video link (up to 15 Mbps, 720p video at 30 fps with 250 ms of latency). The application results in the real-time exploitation of the video, obtaining key information for traffic managers such as vehicle tracking, vehicle classification, speed estimation, and roundabout in-out matrices. Imagery download and storage are also performed through the internet, although the Structure from Motion postprocessing is not real-time due to photogrammetric workflows. In conclusion, we describe a real-case application of drone connection to the internet through a 4G network, but it can be adapted to other applications. Although 5G will, in time, surpass 4G capacities, the described framework can enhance drone performance and facilitate paths for upgrading the connection of on-board devices to the 5G network.

    Computing graph neural networks: A survey from algorithms to accelerators

    Graph Neural Networks (GNNs) have exploded onto the machine learning scene in recent years owing to their capability to model and learn from graph-structured data. Such an ability has strong implications in a wide variety of fields whose data are inherently relational, for which conventional neural networks do not perform well. Indeed, as recent reviews can attest, research in the area of GNNs has grown rapidly and has led to the development of a variety of GNN algorithm variants as well as to the exploration of ground-breaking applications in chemistry, neurology, electronics, or communication networks, among others. At the current stage of research, however, the efficient processing of GNNs is still an open challenge for several reasons. Besides their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, or the need to scale to huge graphs in some applications. In this context, this article aims to make two main contributions. On the one hand, a review of the field of GNNs is presented from the perspective of computing. This includes a brief tutorial on the GNN fundamentals, an overview of the evolution of the field in the last decade, and a summary of operations carried out in the multiple phases of different GNN algorithm variants. On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators is distilled. This work is possible thanks to funding from the European Union's Horizon 2020 research and innovation programme under Grant No. 863337 (WiPLASH project) and the Spanish Ministry of Economy and Competitiveness under contract TEC2017-90034-C2-1-R (ALLIANCE project) that receives funding from FEDER. Peer Reviewed. Postprint (published version)

    Graphene-based wireless agile interconnects for massive heterogeneous multi-chip processors

    The main design principles in computer architecture have recently shifted from a monolithic scaling-driven approach to the development of heterogeneous architectures that tightly co-integrate multiple specialized processor and memory chiplets. In such data-hungry multi-chip architectures, current Networks-in-Package (NiPs) may not be enough to cater to their heterogeneous and fast-changing communication demands. This position article makes the case for wireless in-package networking as the enabler of efficient and versatile wired-wireless interconnect fabrics for massive heterogeneous processors. To that end, the use of graphene-based antennas and transceivers with unique frequency-beam reconfigurability in the terahertz band is proposed. The feasibility of such a wireless vision and the main research challenges toward its realization are analyzed from the technological, communications, and computer architecture perspectives. This publication is part of the Spanish I+D+i project TRAINER-A (ref. PID2020-118011GB-C21), funded by MCIN/AEI/10.13039/501100011033. This work has also been supported by the European Commission under H2020 grants WiPLASH (GA 863337), 2D-EPL (GA 952792), and Graphene Flagship (GA 881603); the FLAGERA framework under grant TUGRACO (HA 3022/9-1, LE 2440/3-1); the European Research Council under grants WINC (GA 101042080), COMPUSAPIEN (GA 725657), and PROJESTOR (GA 682675); the German Ministry of Education and Research under grant GIMMIK (03XP0210); and the German Research Foundation under grant HIPEDI (WA 4139/1-1). Peer Reviewed. Article signed by 21 authors: Sergi Abadal, Robert Guirado, Hamidreza Taghvaee, and Akshay Jain are with the Universitat Politècnica de Catalunya, Spain; Elana Pereira de Santana and Peter Haring Bolívar are with the University of Siegen, Germany; Mohamed Saeed, Renato Negra, Kun-Ta Wang, and Max C. Lemme are with RWTH Aachen University, Germany; Zhenxing Wang, Kun-Ta Wang, and Max C. Lemme are also with AMO GmbH, Germany; Joshua Klein, Marina Zapater, Alexandre Levisse, and David Atienza are with the Swiss Federal Institute of Technology, Switzerland; Marina Zapater is also with the University of Applied Sciences and Arts Western Switzerland; Davide Rossi and Francesco Conti are with the University of Bologna, Italy; Martino Dazzi, Geethan Karunaratne, Irem Boybat, and Abu Sebastian are with IBM Research Europe, Switzerland. Postprint (author's final draft)

    Understanding the design-space of sparse/dense multiphase GNN dataflows on spatial accelerators

    Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Owing to their unique compute and memory characteristics that come from an interplay between dense and sparse phases of computations, the emergence of reconfigurable dataflow (aka spatial) accelerators offers promise for acceleration by mapping optimized dataflows (i.e., computation order and parallelism) for both phases. The goal of this work is to characterize and understand the design-space of dataflow choices for running GNNs on spatial accelerators in order for mappers or design-space exploration tools to optimize the dataflow based on the workload. Specifically, we propose a taxonomy to describe all possible choices for mapping the dense and sparse phases of GNN inference, spatially and temporally over a spatial accelerator, capturing both the intra-phase dataflow and the inter-phase (pipelined) dataflow. Using this taxonomy, we do deep dives into the costs and benefits of several dataflows and perform case studies on the implications of hardware parameters for dataflows and the value of flexibility to support pipelined execution. Parts of this work were supported through a fellowship by NEC Laboratories Europe, Project grant PID2020-112827GB-I00 funded by MCIN/AEI/10.13039/501100011033, RTI2018-098156-B-C53 (MCIU/AEI/FEDER, UE), and grant 20749/FPI/18 from Fundación Séneca. Peer Reviewed. Postprint (author's final draft)